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1 

MOLECULAR MODELLING AND DRUG DESIGN 



FIELD OF INVENTION 

The present invention relates to a method for assessing the 
absolute free binding energy between a host or receptor 
5 molecule and a chemical substance interacting therewith, e.g. 
bound thereto, and to a method for assessing the relative 
free binding energy between a plurality of systems comprising 
a host or receptor molecule and a chemical substance interac- 
ting therewith. 

10 BACKGROUND OF THE INVENTION 



Computational chemistry and molecular modelling have become 
an essential part of the modern drug design process. This 
type of methodology is now being used as a complement to 
experimental biochemical studies and 3 -dimensional structure 

15 determination by crystallographic or NMR methods. The most 
important applications of computational modelling in drug 
design comprise (i) methods for finding new "lead compounds 1 ', 
(ii) interactive computer graphics for modifying and manipu- 
lating the chemical and geometrical structure of Inhibitors 

20 and (iii) subsequent energy and structure refinement using 
molecular mechanics (MM) or molecular dynamics (MD) calcula- 
tions (For reviews, see Cohen et al., 1990, Dixon, 1992). 
Until now, however, molecular modelling methods can only 
provide a certain extent of qualitative information on the 

25 affinity of various compounds for at given target site. At 
best, some tentative binding candidates can be proposed and 
perhaps some compounds can be excluded which do not seem to 
match the target site well. This may be done by considering 
molecular shape and electrostatic properties, but the quant i- 

30 tative discrimination between different ligands in terms of 
actual binding constants must usually be left to the experi- 
mental work of organic chemists. The reason for this is that 
it is extremely difficult to theoretically predict relative 
or absolute binding constants, i.e., free energies, for all 
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but the very simplest cases. The only available approach to 
this type of problem today is the so-called Free Energy 
Perturbation (FEP) technique (for reviews, see: Beveridge and 
DiCapua, 1989, Jorgensen, 1989, Straatsma and McCammon, 1992) 
5 which in practice is limited to such "small perturbations" 
that it cannot realistically be used for molecules of the 
sizes which drug compounds represent. 

If a typical case is considered where it is desired to deter- 
mine the relative free energy of binding between two com- 
10 pounds, A and B, the problem is described by the thermodyna- 
mic cycle: 

A(w) ■* B(w) 
MhartlA) * * AG bilsd iB) 

15 A(p) - B(p) 

where MGj ol and MGf OI denote the differences in solvation 
energy between A and B in water and in the (solvated) protein 
site, respectively, and the AG^^'s are the corresponding 

20 binding energies; Ag^ nf ,(B) - AG bind (A) » AA<^ oI - AAGJ o1 . The 
same cycle can be used to determine the absolute binding 
constant of B if A denotes a nil particle. In that case 
AG bind [A) m 0 and the binding energy is obtained as the 
difference between the absolute solvation energies for B in 

25 water and in the protein. With the PEP approach, the free 

energies associated with the two unphysical paths A(w) -» B(w) 
and A(p) -* B(p) are calculated, corresponding to a "mutation" 
of A into B (or the creation of B in the case where A is a 
nil particle) . MD (or Monte Carlo) simulations are used to 

30 collect ensemble averages along the paths, which must be 

rather fine grained in order for the free energy to converge 
properly. If, e.g., two enzyme inhibitors are considered, the 
path connecting them will involve changes in the molecular 
charge distribution as well as the creation/annihilation of 

35 atoms. Especially the latter type of process converges slowly 



W 95/06293 



PCT/IB94/00257 



3 

and thus, if the A and B "states" are very different, it will 
be difficult to obtain convergent free energies even with 
present day computer resources (Pearlman and Kollman, 1991; 
Straatsma and McCammon, 1991; Mitchell and McCaramon, 1991) . 
5 It should also be noted that most of the computing time in 
FEP is actually spent on uninteresting configurations that 
correspond to a "mixture" of A and B. Moreover, if large 
conformational changes are involved, this always poses a 
major problem. Hence, the method only really works well when 

10 A and B are rather similar and, in fact, all applications 
related to drug design that have been reported fall within 
this category (e.g., Wong and McCammon, 1986; Bash et al., 
1987; Brooks, 1989; Merz and Kollmann, 1989; McDonald and 
Brooks, 1991; Rao and Singh; 1991; Tropsha and Hermans, 

15 1992) . The largest "perturbation" to which the PEP method has 
been applied is a hydrogen to hexyl group transformation 
reported by Merz et al., 1991. Drug design in practice, 
however, often deals with relatively large inhibitors that 
differ considerably from each other so that neither relative 

20 nor absolute free energies can be obtained with the FEP/MD 
method within reasonable computing time. 

Thus, there is an urgent demand for a method for assessing 
free energies of binding between host or receptor molecules 
and chemical substances of sizes relevant for drug compounds 
25 without the necessity of synthesizing the chemical substan- 
ces. 

BRIEF DISCLOSURE OF INVENTION 

The present invention provides a method which makes it pos- 
sible to assess free energies of binding between a "host or 

30 receptor" molecule and a chemical substances, such as a drug 
candidate, interacting therewith, without the necessity of 
synthesizing the chemical substances. Thereby, the time, 
efforts and costs involved in arriving at realistic drug 
candidates which it is justified to actually synthesize can 

35 be dramatically reduced, and the researchers involved in the 
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drug development will have a greater degree of freedom in 
their work during the early and intermediate phases of the 
development work. 

In the present context, the term "a host or receptor molecu- 
5 le" should be understood in a broad sense as any molecule 
which can interact with a chemical substance, and the inter- 
action of which with a chemical substance or a group or 
plurality of chemical substances, e.g. drug candidates, is to 
be studied. Thus, the host or receptor molecule may simply be 

10 any another chemical compound capable of interacting with the 
chemical substance, but most often, the host or receptor 
molecule will be a relatively large molecule, in other words 
a macromolecule such as a protein or an oligonucleotide, 
which is relatively large compared to the chemical substance; 

15 although the chemical substance interacting with the host or 
receptor molecule may, of course, in itself be a 
macromolecule . In many cases relevant in practice, the host 
or receptor molecule is a relatively large molecule of na- 
tural origin or prepared by recombinant DNA technique and 

20 having a particular biological function, e.g. as an enzyme, 
an antigen, an antibody, a biological receptor, etc., and the 
chemical substance is a synthetic substance of a structure 
known or believed to interact with or bind to the host or 
receptor molecule. 

25 Contrary to the Free Energy Perturbation technique, the 

method of the invention does not involve "mutational paths", 
but rather determines the free energy of binding by an ap- 
proximation, suitably a linear approximation which only 
involves an average interaction between the chemical subs tan - 

30 ce and its surrounding. The interaction (or potential) energy 
between the chemical substance (in the following often refer- 
red to as the "drug 11 ) and its surrounding is divided into a 
polar (electrostatic) and a non-polar (hydrophobic) contribu- 
tion, and the absolute free energy of binding is assessed as 

35 an adjusted combination of these two contributions. 



WO 95/06293 



PCT/IB94/00257 



5 

Each f these contributions to the absolute free energy of 
binding is assessed as the difference between two distinct 
states, A and B, of interaction between the chemical substan- 
ce and its surroundings which define the binding process, one 
5 state (A) being a state in which the chemical substance is 
surrounded by solvent, and the other state (B) being a state 
in which the chemical substance, interacting with (bound to) 
the host or receptor molecule, is surrounded by solvent. 

Thus, the method of the invention for assessing the absolute 
10 free energy of binding between a host or receptor molecule 
and a chemical substance comprises: 

1) assessing the average energy difference, <AVf* ff >, defined 
as <V±* g > B - <Vfi ff > A , between the contribution from polar 
interactions to the potential energy between the chemical 
15 substance (denoted i) and its surroundings (denoted s) in 

two states, one state (A) being where the chemical sub- 
stance is surrounded by solvent, the other state (B) 
being where the chemical substance, bound to the host or 
receptor molecule, is surrounded by solvent, 

20 2) assessing the average energy difference, <AVjf£>, defined 
as <vJ?£>b - <vy*£> A , between the contribution from non- 
polar interactions to the potential energy between the 
chemical substance (denoted i) and its surroundings 
(denoted s) in two states, one state (A) being where the 

25 chemical substance is surrounded by solvent, the other 

state (B) being where the chemical substance, bound to 
the host or receptor molecule, is surrounded by solvent, 
and 

3) calculating the absolute binding free energy as an adjus- 
30 ted combination of the two above-mentioned average energy 

differences . 

According to one embodiment of the present invention, the 
adjusted combination of the above-mentioned energy differ en- 
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ces comprises about one half of the value of the polar bin- 
ding energy difference between states B and A, AVf^ fl , pre- 
ferably one half of this value, in accordance with the linear 
response approximation, e.g. Marcus' theory of electron 
5 transfer reactions {Marcus, 1964), to which is added the non- 
polar (hydrophobic) contribution, <AV?^>, adjusted by means 
of an empirical parameter. 

Thus, it has been found that the above-mentioned combination 
of the average energy differences can productively be assumed 
10 to contribute to the free energy of binding according to 

AGbind - *<*Vi 1 - s > * U> 

where the A's denote the difference between the two states, A 
and B, which define the binding process. That is, A repre- 
sents the "drug" free in solution and B represents the case 
15 with the "drug* interacting with the solvated protein. 

Thus, <AVf^ 5 > = <vfi s > B - <Vf^> A where < > A , < ^ denote 
molecular dynamics averages calculated for the corresponding 
state. Equation 1 can thus be written 

^lad-«(<^ fl >B " <*i-s>A> + «<<V&>b " <*&>a) 

20 Each of the four averages can be calculated by standard 

molecular dynamics procedures using suitable computer soft- 
and hardware. 

In the mathematical equations herein, the symbol < > means 
molecular dynamics average. The index i-s means compound - 

25 solvent (or compound- surrounding) , the letter "i" standing 
for "inhibitor", which reflects the fact that the relevant 
compound or drug will often be a compound or drug which is 
intended to inhibit the function of the host or receptor 
molecule. The superscript "el" designates the polar or elec- 

30 trostatic energy, while the superscript "vdw" indicates "van 
der Waals", another designation for the non-polar interac- 
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tions. The symbol A indicates that the quantity in state A is 
subtracted from the quantity in state B. 

Considerations in Connection with the Development of a Pre* 
f erred Embodiment of the Invention 

5 As a starting point was taken the linear response approxima- 
tion for electrostatic forces which for polar solutions as a 
result yields quadratic free energy functions in response to 
the development of charges. This is, e.g., the familiar 
result from Marcus' theory of electron transfer reactions 
10 (Marcus, 1964) . For a system with two states, A and B, given 
by two potential energy functions and V B one obtains, 
within the approximation of harmonic free energy functions of 
equal curvature, the relationship (see Lee et al., 1992 and 
references therein) : 

15 X - <V B - V A > A -AGrb « <V A - + AG^ (a) 

where AG^ is the free energy difference between B and A, X 
the corresponding reorganisation energy and < > A denotes an 
average evaluated near the minimum of the potential i. Thus, 

AGnj « X(<AV> A + cAV^) (b) 

20 where AV now denotes the energy difference V B - V A . If the 

hydration of a single ion is considered, this can be shown to 
give AG-jJj - ^ c ^ie > ' ** e * t^t the electrostatic contribution 
to the solvation energy equals half of the corresponding ion- 
solvent interaction energy (Warshel and Russell, 1984; Roux 

25 et al., 1990). Returning now to the inhibitor binding pro- 
blem, this result may be exploited as indicated in Pig. 1, 
For each solvation process, i.e, solvation of the inhibitor 
in water and inside the protein, two Btates are considered 
where the first has the inhibitor molecule in vacuum and a 

30 non-polar cavity (given, e.g., by Lennard- Jones potential) 
already made in the given environment. The second state 
corresponds to the intact inhibitor molecule surrounded by 
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water or the solvated protein- The linear response approxima- 
tion will then again give that AGgJ^ « tf<Vf* ff >, where Vff fl is 
the solute -solvent electrostatic term. Hence, the electrosta- 
tic contribution to the binding free energy can be approxima- 
5 ted by AGg^ « %<Vf: 8 > (where the A now refers to the dif- 
ference between protein and water) and thus obtained from two 
MD simulations of the solvated inhibitor and of the inhibi- 
tor-protein complex. 

The validity of the linear response results in the case of 
ionic solvation has been confirmed, e.g., in the study by 
Roux et al. (1990) . Some additional calculations were also 
performed on simple systems that corroborate the approxima- 
tion of equation b. These tests were carried out by comparing 
the free energy obtained from FEP/MD simulations of charging 
Na + and Ca 2+ ions in a spherical water system (Aqvist, 1990) 
with the corresponding <vf^> from 75 ps MD trajectories. 
This yielded factors relating <Vf^> to AGg* a of 0.49 for Na + 
and 0.52 for Ca 2+ , both values being close to the predicted 
result of M- A similar test on the charging of a methanol 
molecule, given by the OPLS potential (Jorgensen, 1986) in 
water gave a AG|J 1 /<vf^ B > ratio of 0.43. 

A crucial question was how to account for the contribution of 
non- polar interactions and hydrophobic effects to the free 
energy of binding which was termed AGgf£ d . In the ideal case, 
25 it should be possible to estimate this contribution from the 
non-polar (or van der Waals) interaction energies. The liquid 
theories of Chandler and coworkers (Chandler et al., 1983; 
Pratt and Chandler, 1977) have been successfully used to 
analyze hydrophobic effects and to calculate free energies of 
30 transfer for some non -polar molecules (Pratt and Chandler, 
1977) , but no analytical treatment of that kind seemed pos- 
sible for solvation in an inhomogeneous environment such as a 
protein's active site. However, it was noted that the ex- 
perimental free energy of solvation for various hydrocarbon 
35 compounds, such as n-alkanes, depends approximately linearly 
on the length of the carbon chain both in their own liquids 
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as well as in water (Ben -Nairn and Marcus, 1984) . MD simula- 
tions of n-alkanes solvated in water (Pig. 2) and in a non- 
polar van der Waals solvent (not shown) nave been carried 
out, which indicate that also the average solute -solvent 
5 interaction energies vary approximately linearly with the 
number of carbons in the chain (the relationships being 
different in different solvents, of course). It would thus 
seem possible that a simple linear approximation of ACjffi^ 
from <AVj?£> might be able to account for the non- polar 

10 binding contribution. For instance, if <r is considered some 
appropriate measure of the size of the solute and if the 
solute- solvent van der Waals interaction energies and the 
corresponding non-polar free energy contributions (both in 
water and protein) depend linearly on a, such that 

15 <V% iw > « cL p o, <v£ dfcr > « cvr, AGg*" - fi p o and AG^ - jS^ff then 

AGZ^ d « fy~h<Av7*?> is obtained. Since it seems difficult to 

derive a factor relating the two quantities in a reliable way 
from purely theoretical considerations, the approach was 
taken by the inventors to empirically try to determine such a 
20 relationship which is capable of reproducing experimental 
binding data. Thus, the free energy of binding is in one 
embodiment of the invention approximated by 

M bind ~ *<AVfV + <*<AVJ?£> (l) 

the parameter a being determined by empirical calibration. 

25 Although, as discussed above, a theoretical prediction of the 
coefficient for <AVf^> is H, it may be practically useful to 
also treat this coefficient as an empirical parameter. This 
would lead to the free energy of binding being approximated 
by 

30 AG biQd - j8<AVf* fl > + a<AVj^> (lb) 

where both parameters, of and 0, are determined by empirical 
calibration. 
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DETAILED DISCLOSURE OP INVENTION 

The assessment of the averages is normally performed by 
establishing 3 -dimensional models or structural representa- 
tions, using, e.g., suitable standard computer hardware and 
5 software, comprising a 3 -dimensional structure of the recep- 
tor molecule alone and a 3 -dimensional structure of the 
chemical substance "docked therein", and applying molecular 
dynamics calculations to the 3 -dimensional representation. 
Force field data for use in the molecular dynamics calcula- 

10 tions may be from any suitable force field such as publicly 
available force fields, e.g., AMBER (Weiner et al., 1986), 
CHARMM (Brooks et al., 1983), GROMOS, OPLS (Jorgensen, 1986), 
MM2 (Allinger, 1977) , etc. The purpose of the molecular 
dynamics calculation is to be able to explore the available 

15 conformations of the system, thereby calculating the average 
interaction energies. In the simulation, the molecules are 
allowed to move around to enable exploration of the confor- 
mational space. 

The establishment of the 3 -dimensional structural representa- 
20 tion may be performed using any method which will result in 
the establishment of the 3 -dimensional coordinates of the 
molecule or combination in question, including crystallisa- 
tion and X-ray crystallography, NMR, computer modelling, etc. 

In the present context, the term "docked therein" indicates 
25 that the chemical substance has been brought to "fit" with 
the receptor molecule in the 3 -dimensional representation; 
while this does not, in the present context, imply any nu- 
merical limitation with respect to the quality of the 3- 
dimensional fit, it is evident that the binding free energy 
30 resulting from the methods will reflect the degree of "fit". 
However, molecular dynamics applied to the system comprising 
the receptor molecule and the compound docked therein will in 
itself be useful for checking the correctness of the docking 
in case more than one position and/or more than one orienta- 
35 tion is possible. 
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The calculation of the above-mentioned energy averages can, 
in principle, be performed separately, either manually or 
(preferably) by means of a computer. In both cases, the 
calculation may be performed on the basis of the well-known 
5 classical mechanical principles involving simulations on the 
equations of motion of the relevant molecular systems (mole- 
cular dynamics or MD) ; another principle could be the so- 
called Monte Carlo simulations, which does not actually solve 
the equations of motion, but rather calculate the probabili- 

10 ties of different conformations; a still further principle 
could be energy minimization in which the averages are repla- 
ced by minimum energies. Neither of the two last -mentioned 
principles are, however, contemplated to be as efficient as 
molecular dynamics. On the other hand, it should be under - 

15 Btood that whenever reference is made to molecular dynamics 
in the following, the two other principles mentioned could 
also be interesting. It is preferred to perform the calcula- 
tions using suitable software implementing the principle in 
question, preferably software which interfaces or communica- 

20 tes with the stored coordinates of the 3 -dimensional models 
of the receptor molecule and of the receptor molecule with 
the chemical substance docked therein; such software is 
available as standard commercial software, e.g. one of the 
many commercially available types of computer software packa- 

25 ges suitable for the purpose. 

It will be understood that complex calculations involved in 
molecular dynamics simulations render the manual determina- 
tion of the above-mentioned energy averages practically 
impossible, thus making the use of suitable computer soft- 
30 and hardware the preferred - and in practice necessary - 
choice. 

It is one of the main features of the present invention that 
it has been found possible to arrive at a very high degree of 
conformity with actual measured values on the basis of such a 
35 simple procedure. The parameter with which the non-polar and 
hydrophobic energy difference, AVj?£, is adjusted is suitably 
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a c efficient representing the result of a calibration estab- 
lished by comparing the results of predictions with actual 
measured values. The term "representing the result of a 
calibration" means that the parameter has been established by 
5 such calibration, or that the parameter has a value which 
would have resulted from such calibration against actual 
measured values; thus, it is not precluded that the parame- 
ter, although representing the results, could have been 
provided, in a particular case, in any other suitable manner. 

10 It is believed that the adjusted parameter may be valid for a . 
relatively broad range of systems, but when working in any 
particular system, it may be preferred to make a calibration 
against actual values measured on representatives of the 
system. Examples of such calibrations are given below. Thus, 

15 the calibration described in the below Example 1 was found to 
result in a value of the coefficient a of about 0.16. It is 
presumed that the numerical value of a will be at the most 
1.0, and that most values of a in practice will be at the 
most 0.5 or preferably at the most 0.3, such as at the most 

20 0.2. While these values are understood to be absolute values, 
it is believed that at will in fact be a positive value in 
most cases. 

The coefficient for the electronic term <AVjf ff > is predicted 
to be but may also be treated as an empirical parameter 

25 (0) determined by calibration against known data. It is then 
believed that the parameter ft will assume a value of at most 
1.0, and that most values of 0 in practice will be about 
0.2 - 0.8, such as about 0.3 - 0.7 or preferably about 
0.4 - 0.6. As stated above, 0 will in a most preferred em- 

30 bodiment of the invention have a value of about 0.5. While 
these values are understood to be absolute values, it is 
believed (as is the case for the coefficient of) that 0 will 
in fact be a positive value in most cases. 

In some systems, it seems suitable to add an additional 
35 constant term to Equation 1, so that the equation becomes 
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(2) 



where a is a coefficient representing the result of a cali- 
bration established by comparing the results of predictions 
with actual measured values (as described above) , and c is a 
5 constant reflecting extrapolation to zero size of the chemi- 
cal Bubstance, that is, where the regression line is dis- 
tinctly offset from origin when moving towards zero size of 
the chemical substance. The parameter c may also be used to 
correct for possible systematic errors due to e.g. the neg- 

10 lect of induced polarisation, possible force field deficien- 
cies etc. In these cases, c will normally assume a value 
between -10 and 10 kcal/mol, typically between -3 and 3 
kcal/mol, such as between -2 and 2 kcal/mol, e.g. between -1 
and 1 kcal/mol. However, it is anticipated that in many 

15 cases, c can suitably be set to zero, as the extent of devia- 
tion will be of minor importance for the usefulness of the 
predicted values. 

If also the electrostatic coefficient i treated as an empiri- 
cal parameter, the present approximation assumes its most 
20 general form, namely 



* where now both or, ft and c are to be determined by empirical 
calibration. 

While the solvent used in the above method is suitably and 
25 most often an aqueous solvent like water, it is within the 
scope of the invention to take any other suitable solvent as 
a starting point, including, e.g., methanol, ethanol, aceto- 
ne, acetonitrile, chloroform, hexane, etc., or mixtures 
thereof or combinations of such solvents or mixtures thereof 
30 with water. The selection of the solvent will be of little 
importance to the predicted values as long as the solvent is 
one which is able to dissolve or solvate the receptor molecu- 
le and the substance (in the present context this means that 



(2b) 



WO 95/06293 



PCMB94/00257 



14 

a sufficient amount of the receptor molecule can be homogene- 
ously mixed with the solvent without precipitation so as to 
allow the determination of binding energies by some suitable 
method) , but there may be cases where it is advantageous to 
5 modify the solvent environment (e*g. by modulating the ionic 
strength) in which the interaction of the substance and the 
receptor molecule is to take place. If the environment in 
which the interaction between the chemical substance, such as 
a drug, and a host or receptor molecule is to take place in 
10 the actual use of the drug is the human body, it might be 
particularly suitable to imitate e.g. human plasma as the 
solvent . 

While the above discussion has primarily dealt with cases 
where the absolute free energy of binding is determined, the 
15 method of the invention also makes it possible to determine 
relative values of free energy of binding between a number of 
chemical substances capable of interacting with a host or 
receptor molecule. 

As an example, the case may be considered where there are 
20 four inhibitors, I 1# I 2 » I3 and I 4 . For each inhibitor, a 
<AVfl B > and a <AVj^£> are calculated from molecular dynamics 
simulations, or Monte Carlo simulations, or simply by energy 
minimization. For instance, for any particular guess of the 
parameter a, it can then be seen how good this guess is by 
25 comparing the calculated (from Equation 1) and observed 
values of AG h y nrf for each inhibitor* In order to find the 
best Of, a least -squares optimization can be used, which means 
that the sum 



is calculated and minimized. The minimization can be obtained 



minimum. The best a obtained in this way is thus the or that 
gives the least discrepancy between calculated and observed 
values (in what is called the least-squares sense) . 




30 by varying at systematically so that this sum reaches its 
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Another approach is to determine ot analytically* Using for- 
mula 1 and given that the absolute binding energies for n 
different inhibitors are known, a can be determined by mini- 
mizing the following sum of squares (SS) expression: 

5 As stated above, this can be done by iterative methods, but 
it is also possible to arrive at an exact result by partial 
differentiation with respect to a of the sum SS: 

and equalling the partially differentiated sum to zero, which 
yields : 

£ ( 2 <A^>.(|<AV i ^> +a <AV 1 '*'>-A C ^)J=0 
t 

t(<^>^)-lt(<^>^<>) 

i-1 



10 Accordingly, the parameters a and j? and/or c can be determi- 
ned by comparing the AGg£§ values (from any of the equations 
lb, 2 and 2b) with AGgjgjj values and varying the parameters 
in the pertinent formula until a minimization of the sum of 
squares is obtained or by determining the parameters analyti- 

15 cally by partial differentiation with respect to the parame- 
ters of the least squares expression. In the case formula 2b 
is used and an analytical solution is sought, the partial 
differentiation of the least square expression with respect 
to a, P and c will e.g. result in three linear equations with 

20 the three parameters to be determined* 

The above -sketched procedures assume that there is access to 
the absolute observed values * Now, if for Borne 
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reason only the differences (relative values) of AG^ind are 
available for the four inhibitors, the actual AG^ind for each 
inhibitor would not be known, but the experimental data would 
consist of 
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There would thus be six experimental data points, each repre- 
senting a difference in binding energy. 

Again, for any guess of or (and 0) , the ^G bind values would be 
calculated (by one of the formulas 1 or 2) and their dif- 
15 ferences taken to obtain the MG^j values. Then, in the 
same manner as above, the sum 



would be minimized to obtain the optimum value of or (as well 
as ff) . Also in this case, the parameters a and 0 could be 
determined by iterative methods as well as analytically. 

20 These parameters are then the ones that gives the best agree- 
ment with respect to the relative binding energies (or bin- 
ding energy differences) . 

Accordingly, for a situation with n different inhibitors the 
sum to be minimized would then be 



25 -where N « #(n -n), because there are #(n 2 -n) differences in 
binding energy between inhibitors in a system with n inhibi- 




tors. 
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One could also imagine a case where, for some reason, the 
calculations give a systematic error in the absolute AG bind 
values. It might then be desirable to try to obtain the best 
fit of the relative energies instead, and the latter type of 
5 the above -described optimization methods would then be pre- 
ferable. 

It will be evident to the person skilled in the art that the 
method of the invention can be utilized in a number of ways 
to shorten drug development time and make molecular modelling 

10 more efficient * The most evident advantage seems to be that 
considerable synthesis efforts and thereby very considerable 
time- and resource -consuming activities can be saved on the 
way towards realistic drug candidates. On the other hand, as 
will be explained in greater detail herein, because the 

15 method of the invention involves an empirical element, the 
possibilities of performing relevant calibrations and ex* 
perimental confirmation should not be neglected where such 
calibration or confirmation is indicated. 

A couple of scenarios can be given to illustrate the possibi- 
20 lities provided by the invention: 

1) When considering a new system (a new raacromolecule for 
which it is desired to design a drug interacting with the 
macromolecule) , the initial stage could be to establish a 
possible lead compound by some means, e.g. computer 

25 modelling. The method of the invention could then be 

applied, for example using equation 1 with a value of a 
established earlier (for some other system, e.g. the 
value of a of about 0.16 disclosed herein) . If the out- 
come of the calculations according to the method of the 

30 invention would indicate that the binding of the lead 

compound is good enough, the lead compound could be 
synthesized, and its actual binding power could be 
measured. If the outcome of the calculations do not 
indicate that the binding of the lead compound is good 

35 enough, a next stage could be to try to modify the lead 
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compound "manually", using computer graphics, in order t 
improve the binding, and then perform the method of the 
invention on the modified compound; these steps could be 
repeated one or several times until a compound had been 
5 designed the data of which would show that the compound 

would be worth synthesizing. As mentioned above, it is 
not unlikely that even for a completely new system, a 
suitable drug candidate can be found without any recali- 
bration of a. 



10 2) If, after comparing the results of the method of the 

invention for a series of compounds with actual measured 
values for a new system, it is found that the agreement 
using an earlier established value of a is not satisfac- 
tory, a recalibration would be performed on the system in 

15 question, and then the further procedure would be as in 

1) above. 



3) If it is found that the calibration of a is not "univer- 
sal", a number of ot'B characterizing different systems 
could be established* It is contemplated that in such a 

20 case, a given <* would pertain to a certain class of 

(chemically similar) inhibitors or (less likely) a class 
of e.g. similar protein binding sites, or a combination 
of these two possibilities. A change of the solvent from 
e.g. water to methanol may change the optimal value of a. 

25 In any case, for a new system, one would then choose an 

earlier established value of of for a similar system. 



Of course, when using any of the formulas lb, 2, and 2b it is 
necessary to provide suitable values for both a and 0 and/or 
c, for example by the methods outlined above. 

30 While several other scenarios can be envisaged, it is 

believed that one important utilization of the method of the 
invention will be a method for identifying a chemical sub- 
stance capable of interacting with a host or receptor molecu- 
le, e.g. binding to the host or receptor molecule, with a 



WO 95/06293 



FCT/IB94/00257 



19 

predicted binding energy equal to or better than a predeter- 
mined threshold value, comprising 

1) choosing a chemical substance, A, which could potentially 
interact with the host or receptor molecule, and providing a 

5 3 -dimensional structural representation thereof, 

2) predicting the binding free energy between the chemical 
substance A and the host or receptor molecule by the method 
of the invention, 

10 3) if the predicted binding free energy between the chemical 
substance A and the host or receptor molecule determined is 
equal to or better than the predetermined threshold value, 
then identifying the chemical substance A as the chemical 
substance X, 

15 4) if the predicted binding free energy between the chemical 
substance A and the host or receptor molecule determined is 
not equal to or better than a predetermined threshold value, 
modifying the 3 -dimensional structural representation and 
predicting the binding free energy between the thus modified 

20 chemical substance, B, and the host or receptor molecule by 
the method of the invention, and 

S) if necessary repeating step 4 until the predicted binding 
free energy determined between the resulting chemical sub- 
stance, X, and the host or receptor molecule is equal to or 
25 better than the predetermined threshold value. 

One suitable way of calibrating the above-mentioned method 
could comprise providing a sample of a chemical substance or 
chemical substances selected from the chemical substances A, 
B and X and providing a sample of the host or receptor 
30 molecule, measuring the binding free energy between the 
chemical substance or substances and the host or receptor 
molecule, and if the measured binding free energy between the 
chemical substance or substances and the host or receptor 
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mol cule is not equal r substantially equal t the predicted 
value, then performing a calibration of the method according 
to the invention so as increase the predictive value of the 
method. 

5 LEGEND TO FIGURES 

Fig. 1. Illustration of how the approximation of equation b 
can be used to estimate the electrostatic contribution to 
AG eol in a given environment that can either be water or 
protein. The state A corresponds to a system with the isola- 

10 ted inhibitor in the gas -phase and a ready-made van der Waals 
cavity in the condensed system. State B is simply the solva- 
ted inhibitor in water or in the protein's active site. These 
states are given by the two potentials V A and V B . For any 
given configuration of the system, equation b will reduce to 

15 #Vff s since the solvent configuration in state A is uncorre- 
cted with the charge distribution of the solute in state B. 

Fig. 2. Calculated dependence of the average solute- solvent 
interaction energy on the length of the carbon chain for n- 
alkanes solvated in water. 

20 Fig. 3. Chemical structures of the inhibitors I 1# I 5 
used in the calculations in Example 1. 

Fig. 4. Stereo view of the 50 ps average MD structure of the 
EP-I X complex (thin lines) superimposed on the corresponding 
crystallographic structure (see Example 1) . The active site 
25 of the enzyme is shown with the bound inhibitor in the centre 
of the picture. 

EXAMPLE 1 

As a test system, endothiapepsin (EP) was chosen; it belongs 
to the family of aspartic proteinases (see, e.g., Davies, 
30 1990; Fruton, 1976), a class of enzymes for which numerous 
studies of inhibitor binding have been reported. Crystal 
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structures of native endothiapepsin and inhibitor complexes 
have been published by Blundell and coworkers (Foundling et 
al., 1987; Veerapandian et al., 1990). In the present 
Example, the crystal structure of endothiapepsin was used in 
5 complex with the inhibitor H218/54 (I lf Fig. 3), recently 
determined by Symbicom AB, as the structural starting point 
for the computations* Four other inhibitor compounds were 
used in this work and their chemical structures are also 
shown in Fig. 3 (I 2 -I 5 ) . Experimental binding data for the 
10 inhibitors have been obtained (see A.K. Falknas, Graduation 
report, University of Gothenburg, Sweden) . 

Computational details 

Starting from the experimental structure of the BP-^ com- 
plex, the other inhibitors were "manually" built into the EP 

15 active site using the FRODO (Jones, 1978) and Hydra (Hubbard, 
1984) graphics software. This model building was relatively 
straightforward since all the compounds contain the invariant 
hydroxyethylene transition state isostere adjacent to a 
peptide group, also present in the inhibitors studied by 

20 Blundell 's group (cf. above). After initial energy 

minimisation, Mp simulations consisting of 25 ps of equili- 
bration and 50 ps of data collection were performed for each 
of the inhibitors, solvated in water as well as bound to the 
solvated protein, 

25 The ENZYMIX programme (Warshel and Cr eight on, 1989) was used 
for all MD simulations together with the GROMOS potential 
(van Gunsteren and Berendsen, 1987) . This force field was 
revised by the present inventors with respect to the oxygen - 
extended carbon (CHq) interactions, since the standard GROMOS 

30 parameters were found from FEP simulations to give an in- 
correct value of AGbydc - +0.l±0.5 kcal/mol for methane. By 
changing the oxygen repulsive Lennard - Jones 6/12 parameter 
(AJ for O-CHq interactions from 421.0 to 793.3 
(kcal/mol )*& € , AG hy ^ r for methane becomes 2.5±0.4 kcal/mol, in 

35 better agreement with the experimental value of 2.0 kcal/mol 
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(Ben-Nairn and Marcus, 1984). This revision is quite important 
for a correct description of the hydrophobic effect. Also the 
Lennard-Jones parameters for charged carboxylate groups had 
been revised in order to better reproduce experimental solva- 
5 tion energies (Aqvist et al., 1993) J The present calculations 
used a spherical water "droplet" of radius 16 A for the 
simulations of inhibitors in solution and a corresponding 
16 A sphere containing both protein atoms and water in the 
simulations of bound inhibitors. The atoms within this sphere 

10 were free to move while protein atoms beyond 16 A were 

restrained to their crystal lographic positions. An interac- 
tion cutoff radius of 8 A was used and the MD time step was 
0*001 ps. The equilibration phase of the protein simulations 
consisted of 5 ps of successive heating of the system and 

15 weakening of harmonic positional restraints that were applied 
to the protein atoms. After this period all restraints within 
the 16 A sphere were set to zero and the system was further 
equilibrated at the final temperature of 298 K for another 
20 ps. The r.m.s. coordinate deviation for the inhibitor 

20 atoms between the following 50 ps MD average structure and 
the experimental ^B-l x complex is 0.94 A. Fig. 4 shows a 
superposition of these two structures, and it can be seen 
that the agreement is quite satisfactory. 

Results and discussion 

25 The first four inhibitors — , I 4 ) were used aB the 

calibration set in order to determine the optimum value of 
the parameter a and examine the success of eq. 1 in reprodu- 
cing the experimental results. Table 1 shows the observed and 
calculated absolute free energies of binding for the inhibi- 

30 tors l l# , I 4 using the value of a - 0.161 which optimizes 

the r.m.s. agreement with experiment. 
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Table 1 

Calculated and observed absolute binding free energies for 
inhibitors l x , . .., I 4 to endothiapepsin (in kcal/mol) . 





^ G calc 




5 1 


-11.16 


-10.69 


2 


-8.23 


-7.93 


3 


-11.14 


-11.67 


4 


-6.29 


-6.57 



The above-mentioned value, 0.161, of ot gives a mean unsigned 
10 error of 0.39 kcal/mol for the calibration set and the 
largest error is 0.53 kcal/mol for inhibitor I 3 . Such an 
accurate fit had not been expected for the absolute binding 
energies; the initial main purpose was to be able to apply 
and calibrate equation 1 with respect to the relative ones. 
15 If the calibration is instead done using relative binding 
energies, one obtains a very similar value of a (0.169) with 
mean unsigned errors that are virtually identical to those 
above. This result is quite remarkable and thus lends further 
support to the method of the invention, embodied in the 
20 approximation of equation 1. The calculated and observed 

relative free energies of binding (using a » 0.161) are shown 
in Table 2 and the average unsigned error is 0.59 kcal/mol. 

Table 2 

Calculated and observed relative binding free energies for 
25 Inhibitors I lf I 4 to endothiapepsin (in kcal/mol). 



I x -Iy MG^c AAGobsd 
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+2.91 
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-1.95 
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3-4 


-4.86 


-5.10 
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The largest error here is 0.99 kcal/mol for the Ix/*3 selec- 
tivity, but all other relative free energies have the correct 
sign and are within 0.8 kcal/mol of the experimental result. 
It is particularly interesting to note that the calculations 
5 were able to discriminate the low-affinity inhibitor I 4 quite 
well from the high-affinity ones (I x and I 3 ) . 

As the above results were encouraging, the predictive power 
of the approach was tested by modelling an inhibitor not 
present in the calibration set. Por this, the inhibitor I s 

10 was chosen; as can be seen from Fig. 3, this inhibitor dif- 
fers significantly in its chemical structure from any member 
of the calibration set. This molecule was built into the BP 
active site and subjected to the same simulation procedure as 
the other inhibitors. The predicted absolute free energy of 

15 binding for I 5 is -9.70 kcal/mol which is in excellent agree- 
ment with the corresponding observed result of AG MfTrf (I s ) « 
-9*84 kcal/mol. The calculated relative binding free energies 
with respect to the four inhibitors in the calibration set 
are given in Table 3, where it can be seen that all the 

20 pairwise selectivities involving X 5 are correctly predicted 
by the simulations, the maximum error in this case being 0*61 
kcal/mol . 

Table 3 

Calculated and observed relative binding fee energies in- 
25 volving inhibitor I 5 (in kcal/mol) . 



30 
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The above results indicate that the simple linear approxima- 
tion of the free energy according to the method of the in- 
vention is able to describe the main physics (electrostatic 
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(polar) and hydrophobic (non-polar) interactions) of the 
binding process in a satisfactory way. 

It is, of course, important to try to identify and separate 
the different types of errors involved in the present method 
5 and suggest how they could possibly be dealt with. There are 
basically four sources of errors, namely (i) inaccurate 
starting structures due to incorrect model building, (ii) 
possible deficiencies in potential energy functions, (iii) 
poor MD convergence, e.g., due to short trajectories, and 
10 (iv) the approximation of eq. 1 itself. The only remedy for 
errors of the first type is to obtain as much structural 
information as possible, and in order to assess their magni- 
tude it would be desirable to carry out calculations on a set 
of inhibitor complexes whose 3-D structures all have been 
15 experimentally determined. Although MM/MD potentials are 

continuously being refined by many research groups, errors of 
the second type may still be considerable. Thus, for example, 
it is not yet entirely clear how well customary protein force 
fields actually reproduce relevant energetic properties. In 
20 the present case, it can be noted that the force field used 
above required some revision in order to reproduce essential 
solvation properties (see above) . 

The convergence properties of MO stimulations depend on how 
far from equilibrium the initial structure is, but judging 
from the present Example, it seems that one can reach satis- 
factory convergence within reasonable computing time. For 
example, by comparing averages over the first and second 
halves of the MD trajectories, average (over all five in- 
hibitors) errors were obtained of ±0.35 and ±0.75 kcal/mol 
for Vj*£ and Vfl g , respectively, in the protein; the corre- 
sponding errors in water are ±0.46 and ±0.62 kcal/mol. This 
would yield a nominal error range of ±0.82 kcal/mol in equa- 
tion 1 originating from the MD convergence uncertainty. Here, 
it can be noted that it seems to be an important advantage of 
the method of the present invention that it focuses on simu- 
lation of the thermodynamically relevant states compared, 
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e.g., to Free Energy Perturbation calculations where most of 
the computing time is spent on the paths between such states. 
The errors associated with the approximation of eq. 1 are 
difficult to estimate quantitatively although the agreement 
5 with experimental binding data obtained here indicates that 
the linear approximation is reasonable. A larger calibration 
set could obviously be desirable, but this is mainly a matter 
of refinement of this particular embodiment of the main 
principle of the invention. As mentioned above, Ben- Nairn and 

10 Marcus (1984) have shown for several classes of hydrocarbon 
chain containing organic compounds that a linear fit of &G soJ 
* Jen + 2, where n is the number of carbon atoms in the chain, 
is quite accurate both in water and non-polar solvents. It 
can, however, be noted that in some cases the extrapolation, 

15 1, of AG fiol to zero chain length is non-zero (e.g., for n- 
alkanes in water 1 a 1.42 kcal/mol) . As discussed further 
above, this might suggest that an additional constant term 
should be added to eq. 1, reflecting a difference in extrapo- 
lation to "zero inhibitor size" (between the water and pro- 

20 tein environments) of the non-polar contribution to AG soI . 
It is also important to emphasize that equation 1 with the 
above parameterisation of a is not an equation for the in- 
dividual solvation energy terms, since the factor ot repre- 
sents the combined effect of several energy/free energy 

25 relations. The fact that solvation energy differences are 
always dealt with may also cause some cancellation of pos- 
sible systematic errors. Furthermore/ due to the larger 
weight given by equation l to the V%\ B term, the formula 
might be expected to give better results for polar inhibi- 

30 tors, as long as the electrostatic linear response approxima- 
tion holds. However, for inhibitors carrying a net charge, 
long range electrostatic effects as well as induced polarisa- 
tion effects are known to be important, but these problems 
have not yet been resolved in most available MD programs and 

35 force fields. When comparing inhibitors of different charge 
states long-range corrections of the Born type (see, e.g., 
Straatsma and Berendsen, 1988; Aqvist, 1990) would obviously 
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become important and may make accurate predictions more difficult. 

As mentioned further above, it is possible that the empirical 
parameter a is readily transferable to other systems, but it 
is also possible that it will display some system dependency. 
5 Given the fact that the parameterisation of force fields can 
differ considerably, d may be found to be force field speci- 
fic. 

However, the above detailed considerations concerning pos- 
sible refinements of the specific embodiments disclosed 

10 herein do not detract from the very important fact that the 
new overall principle according to the present invention is a 
very considerable advance in the art of computational chemis- 
try and molecular modelling, making it possible to perform 
drug design tasks with considerably less resource consumption 

15 and in considerably shorter time than hitherto. 

EXAMPLE 2 

The aspartic proteinase from the human immunodeficiency virus 
type 1 (HIV-1) is the target of intense AIDS drug develop- 
ment. This enzyme and three of its inhibitors were chosen as 
20 a second test system (Hans son, 1994) . 

Computational details 

Starting coordinates were from the 5HVP entry in the Brook- 
haven Protein Data Bank (Fitzgerald et al. 1990), describing 
a complex of the proteinase with the inhibitor acetylpepsta- 
25 tin, designated Inhibitor 1 in this example. 

Most charges of the protein were turned off, by replacing 
charged groups by dipoles with zero total charge. Charges 
left on were as follows (numbering of 5HVP) . 
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The 27 water molecules closest to the active site of the 
protein were retained from the X-ray coordinates file. Ac- 
tually only one water molecule (water 308H in 5HVP) is very 
near the centre, and the rest form the proximal parts of the 

20 water shell around the protein. This central water (308H) is 
in a conserved position for a water molecule in all early 
structures reported for HIV-1 proteinase (Fitzgerald and 
Springer 1991) - It was retained in the calculations for 
Inhibitor 1 and Inhibitor 2, but was removed in the calcula- 

25 tions for Inhibitor 3 due to the fact that Inhibitor 3 is 
engineered to mimic this water molecule with its protruding 
carbonyl oxygen. 

Inhibitor 1: acetyl -peps tatin 

This is the N-capped oligopeptide Ace-Val-Val-Sta-Ala-Sta, 
30 where Sta is the rare amino acid residue statine (Sta, 

(45,35) 4 -amino 3 -hydroxy- 6 -methyl -heptanoic acid), and Ace 



W 95/06293 



PCT/IB94/00257 



29 

is an acetyl group: CH 3 (CO) (Richards et al. f 1989).. Co rdi- 
nates for Inhibitor 1 were taken from 5HVP. The C- terminal 
carboxyl group is negatively charged. 

Inhibitor 2t peps tat in 

5 This is the N-capped oligopeptide Iva-Val-Val-Sta-Ala-Sta, 
where Iva is an isovaleryl group; (CH3) 2 CHCH 2 (CO) . The only 
difference from Inhibitor 1 is the added group of three 
nonpolar carbon atoms (isopropyl group) in the beginning of 
the molecule (Dreyer et al. 1989) ♦ Inhibitor 2 was modeled 
10 from inhibitor 1 by adding the 3 carbon atoms, and rotating 
around the bonds at the end of the molecule to remove any 
collisions between atoms. 

Inhibitor 3: BMP 323 

This is the substituted seven-membered cyclic urea DMP 323 
15 (Lam et al., 1994) , developed by the DuPont Merck Pharmaceu- 
tical Company. The coordinates for the complex of Inhibitor 3 
and the proteinase had not yet been deposited into the Pro- 
tein Data Bank. Therefore this inhibitor was model built: 
simulated annealing was used to determine a low energy struc- 
20 ture of the inhibitor and this was modelled into the active 
site of the proteinase (5HVP coordinates) . 

Simulation parameters 

A sphere of radius 20 A of water was added, and any protein 
atoms outside this sphere were kept fixed. The time step was 
25 1 fs, except for same equilibration steps where noted (2 f s) . 
The temperature was 300 K. 

The cutoff distance was 15 A in the first set of calculations 
for all three inhibitors. Then, Inhibitor 3 was tested with 
15 A, 10 A and 8 A. Tests of the turning off of charges were 
30 performed using 10 A, as were the pH tests described below. A 
10 A cutoff was used for the FEP calculations where Inhibitor 
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2 was changed into Inhibitor 1. These cutoffs apply to inter- 
actions between protein groups of zero net charge. Charged 
protein groups, and all parts of the inhibitor, interact with 
all other parts of the simulated system, without cutoffs. 

5 The molecular dynamics simulations were performed using the 
program ENZYMIX and the GROMOS potential, with modifications 
as in Example 1. 

Calculation of binding energies 

The three inhibitors were simulated, with a 15 A cutoff, 
10 until values were reasonably stable. The protein was elec- 
trically neutral ('Neut'). The approximation of equation (1) 
was applied, with ct « 0.161 taken from Example 1. 

Inhibitor 1: 

AGbind ~ <(-255.84)-(-255.19))+0.16X- { (-89 .13) - (-47.29) ) 
15 - (-0.32)+(-6.74) » -7.1 kcal/mol 

Inhibitor 2: 

AGfejjjj ~ ((-233.01) -(-234.99)) +0.161- ( (-100 .26) - (-55 . 08) ) 
= (+0.98)+(-7.27) = -6.3 kcal/mol 

Inhibitor 3: 

20 &G biad - tf- ((-75.02)-(-69.14))+0.161- ( (-89.86) - (-43 .39) ) 
= (-2.94)+(-7.42) - -10*4 kcal/mol 
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Tabl 4 



I Inhibitor 1: Acetyl -pepstatin 


15 A 




time 






vf** 






ps 


kcal/mol 


kcal/mol 


kcal/mol 


(Heut) : 


Brni*t 1 -J Kt"» t" ^ on 


110 .2 










Data collection 


25.0 


-89.05 


-256.02 


-6073.69 






25.0 


-91.47 


-253.48 


-6063.06 






25.0 


-89.50 


-251.68 


-6037.51 






25.0 


-86.00 


-264.39 


•6029.77 






7.6 


-87.38 


-267.00 


-6012.86 






17.4 


-90.63 


-247.79 


-6026.66 




Average 


125.0 


-89.13 


-255,84 




Water: 


Equilibration 


130.0 


(2fs) 








Data collection 


25.0 


-46.26 


-257.22 


-10492.30 






25.0 


-47.65 


-251.52 


-10512.25 






25.0 


-47.53 


-251.01 


-10502.57 






25.0 


-47.66 


-261.85 


-10517.91 






25.0 


-47.36 


-254.38 


-10519.44 




Average 


125.0 


-47.29 


-255.19 
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Table 5 



1 Inhibitor 2: 


Pepstatin 






15 A 




time 
ps 


kcal/mol 


kcal/mol 


kcal/mol 


J PROTEIN 
(Neut) : 


Equilibration 


85.2 










Data collection 


25.0 


-103.41 


-225.95 


-6078.87 






25.0 


-100.77 


-231.47 


-6161.13 


1 




25.0 


-98.90 


-240.61 


-6168.42 






25.0 


-99.59 


-237.60 


-6149.30 






25.0 


-98.61 


-229.40 


-6126.96 




Average 


125.0 


-100.26 


-233.01 




Water: 


Equilibration 


138.0 


(2f8) 








Data collection 


25.0 


-54.51 


-237.80 


-10471.33 






25.0 


-55.44 


-234.92 


-10472.92 






25.0 


•55.99 


-224.29 


-10469.69 






25.0 


-55.94 


-235.67 


-10470.89 






25.0 


-53.54 


-242.29 


-10478.81 




Average 


125.0 


-55.08 


-234.99 
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Tabl 6 





Inhibitor 3 












time 
ps 


kcal/mol 


kcal/mol 


If* 

kcal/mol 


PROTBIN 
(Neut) : 


Equilibration 


85.2 










Data collection 


25.0 


-89.39 


*76.42 


-5914.07 






25.0 


-89.60 


-74.59 


-5919.20 






9.8 


-90 .05 


-72 .76 


-5919 .46 






15.2 


-88.96 


-74.87 


-5916.97 




Average 


75.0 


-89.46 


-75.02 




Water: 


Equilibration 


50.0 










Data collection 


25.0 


-43.51 


-67.88 


-10405.85 






25.0 


-43.99 


-70.02 


-10417.31 






25.0 


-42.67 


-69.53 


-10412.99 




Average 


75.0 


-43.39 


-69.14 





Cutoff effect 

For Inhibitor 3, three different cutoff distances were tried, 
15 A, 10 A And 8 A. Of these, the first two converged pro - 

10 perly, and gave similar results. The a A run did not con- 
verge, and the coordinates showed that some charged amino 
acids had taken on a very different conformation, exposing 
themselves much more to the surrounding water. This demon - 
strates the effect of overpolarization of water, where a 

15 short cutoff gives the water overly polar properties, making 
it too energetically favourable for charged groups to be 
exposed to water. The following result was obtained for the 
10 A run. 



W 95/06293 



PCT/IB94/0O257 



34 

Table 7 



p 1 — 1 — " 


xnniDitor 


3: DMP 323; 10 k Cutoff 




| 10 A 




tine 


kcal/mol 


kcal/mol 


kcal/mol 


PROTEIN 
(Weut) : 


Equilibration 


bU 










Data collection 


25.0 


-85,76 


-76.64 


-5909.37 






25.0 


-85.96 


-78.20 


-5896.49 






25.0 


-85.73 


-77.56 


-5905.23 




Average 


75.0 


-85.82 


-77.47 




Water: 


Equilibration 


50.0 










Data collection 


21.5 


-42.88 


-72.60 


-10423.32 






3.5 


-44.33 


-60.54 


-10420.80 






25.0 


-43.78 


-70.17 


-10437*82 






25.0 


-43.82 


-65.82 


-10435.30 




Average 


75.0 


-43.56 


-68.97 





Inhibitor 3: 10 A 

AG biod " * ((-77.47) -(-68.97)) +0.161- ( (-85.82) - (-43 .56) ) 
« (-4.25)+<-6.80) = -11.0 kcal/mol 

10 The result was sufficiently similar to justify using 10 A 
cutoffs in the following simulations. 

Electrical effects 

The amino acid residues far from the inhibitor had their 
charges turned off. On the other hand, charges close to the 
15 inhibitor were left on. 

To investigate the effects of turning off charges of amino 
acid residues at 'middle distance' from the inhibitor , a 
different configuration was tested, where Glu 34, Glu 234, 
Lys 45 and Lys 245 were turned off. The same water simulation 
20 is used for both protein configurations. Therefor , relative 
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effects of charges to the protein can be studied by comparing 
protein simulations only. 

Table 8 



Inhibitor 3; BMP 323; 10 A cutoff 



I io A 




time 

PB 


kcal/mol 


V 1 

kcal/mol 


V" J 
kcal/mol I 


| PROTEIN 
(Neut) : 


Equilibration 


60.2 










Data collection 


25.0 


-87.51 


-80.52 


-5767.96 1 






25.0 


-88.97 


-73.78 


-5791.21 I 






24.0 


-89.25 


-77.71 


-5783.69 I 




Average 


74.0 


-88.57 


-77.33 




Water: 


Equilibration 


60.2 










Data collection 


25.0 


-85.76 


-76.64 


-5902.37 






25.0 


-85.96 


-78.20 


-5896.49 






25.0 


-85.73 


-77.56 


-5905.23 




Average 


75.0 


-85.82 


-77.47 





The electrical term almost did not change. The result justi- 
10 fies using the 'Cent"' charges for other calculations on 
small uncharged inhibitors like Inhibitor 3. These four 
peripheral charges where turned off in the following pH 
studies . 

Effects of pE 

15 The binding of inhibitors to HIV-1 proteinase is pH dependent 
(Hyland et al., 1991, Richards et al. f 1989). Generally, pH 
below 6 gives better binding than pH above 7. This may stem 
from pH- induced conformational changes, but the protonation 
state of the two active site aspartates should also be in- 

20 volved. These have a pK a split from sitting close to one 
another and therefore have pK^'a in the ranges 3.1-3.7 and 
5.2-6.5, respectively (Hyland et. al, 1991). 
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To test the effect of protonation of the active site .Asp 25 
and Asp 225 on binding of Inhibitor 3, the two protein states 
'Cent 2 "' and 'CentH 2 ' were compared to the 'Cent"' calcula- 
tions. The same water can be used. 

5 Table 9 



Inhibitor 3: BMP 323; protonation of central Asp's 



I 10 k 




time 
ps 


kcal/mol 


kcal/mol 


kcal/mol 


I PROTEIN 

| (Neut) : 


Equilibration 


60.2 










Data collection 


25.0 
25.0 
24.0 


-87.51 
-88.97 
-89.25 


-80.52 
-73.78 
-77.71 


-5767.96 
-5791.21 
-5783.69 




Average 


74.0 


-88.57 


-77.33 




| Water: 


Equilibration 


£0.2 










Data collection 


25.0 
25.0 
25.0 


-86.40 
-86.81 
-87.44 


-74.00 
•75.48 
-75.54 


-5788.97 
-5789.14 
-5817.64 




Average 


75.0 


•86.88 


-75.01 




PROTEIN 
(CentH^): 


Equilibration 


60.2 










Data collection 


25.0 


-90.00 


-61.93 


-5658.56 




Average 


25.0 


-90.00 


-61.93 





These calculations predict low binding affinity at very low 
pH when both aspartates are protonated CCentH 2 '). The best 
15 binding is found in the singly protonated state ('Cent - '), 
but the doubly negative (high pH) state ('Cent 2 *') is predic- 
ted to give fair binding also. 

Free Energy Perturbation 

A classical free energy perturbation was performed, changing 
20 Inhibitor 2 into Inhibitor 1, by removing three carbon atoms. 
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First the atoms were shrunk, while atom distances were 
constrained to normal bond lengths. 

Then the atoms were pulled in to 20% of their normal 
bond length, while at the same time shrinking them 
5 more. This means pulling the three now small atoms 

inside the end- carbon of Inhibitor 1. 

At last the atoms were allowed to disappear complete* 

iy. 

The protein was in the 'Neut' electric state* Protein Simula- 
10 tions started from the end of the 210 ps run reported above 
for inhibitor 2, and the water simulations started from the 
end of the corresponding 263 ps run. 



Two calculations were run, with l ps and 2 ps per X- point. A 
total of 68 X -points were used for the protein and 61 for the 

15 water calculation. The first step of the three took more 
points to converge in the protein (40) than in water (33) . 
The second two steps were book -kept together, 28 points in 
either case. The first 25% of the data collected from each 
point was discarded. Errors in the table below (table 10) are 

20 hysteresis differences. 
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Table 10 



1 pa/step 


PROTEIN 


WATER 


Inhibitor 2 
4 

Intermediate 

4 

Inhibitor 1 


40 x 0.75 pS 
28 x 0.75 pS 


-0.94 ± 0.16 
-8.69 ± 0.08 


33 x 0.75 ps 
28 x 0.75 ps 


0.09 * 0.20 
-9.21 ± 0.14 


TOTAL: 


68 x 0.75 ps 


-9.63 ± 0.24 


61 x 0.75 ps 


-9.12 ± 0.34 


DIFFERENCE: 


-0.51 ± 0.58 


2 ps/step 


PROTEIN 


WATER 


Inhibitor 2 

4 

Intermediate 
4 

Inhibitor l 


40 x 1.50 ps 
28 x 1.50 ps 


0.27 ± 0.15 
-8.55 ± 0.14 


33 x 1.50 ps 
28 x 1.50 ps 


-0.52 ± 0.16 
-7.06 ± 0.32 


TOTAL: 


68 x 1.50 ps 


-8.28 ± 0.29 


61 x 1.50 ps 


-7.58 ± 0.48 


DIFFERENCE: 


-0.70 ± 0.77 



The result of the calculation is that Inhibitor 1 hinds 0.7 
kcal/mol tighter than Inhibitor 2. There is a difference 
20 between 1 ps and 2 ps calculations, but not so large as to 
make an even longer calculation worthwhile. 

Agreement between methods 

The values calculated by equation (1) were consistent with 
the free energy perturbation. The method of the invention 
25 predicts that Inhibitor 1 will bind 0.8 kcal/mol better than 
Inhibitor 2. The FEP calculation gibes 0.7 Jccal/mol - an 
excellent agreement. 

Comparison with experimental data 

Experimental values (Dreyer et al., 1989, Lam et al. r 1994, 
30 Richards et al., 1989) on the binding of the three inhibitors 
are: 
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Table 11 



5 



Inhibitor 


pH 


T 
K 


xflS 


kcal/mol 


kcal/mol 


Ac eofc 
kcal/mol 


kcal/mol 


1 


5.0 


303 


35 


-10.2 


-10.2 


-7.1 


3.1 


2 


6.0 


310 


1400 


-8.0 


-8.7 


-6.3 


2.4 


3 


5.5 


298 


0.27 


-13.1 


-13.1 


-10.4 
-11.0 


2.7 
2.1 



^gagusted is experimental value, adjusted if the measure- 
ment was made at high pH by adding the free energy needed to 
reach the singly protonated form (a pKa of 5.5 for the upper 
aspartate is assumed) . The rationale behind this adjustment 
10 is the reported {Richards et al., 1989) very poor binding of 
Inhibitor 1 at pH 7.0, suggesting that also Inhibitor 2 binds 
to the singly protonated form of the protein. 

The approximation of equation (l) gives the same relative 
order of binding strength for the three inhibitors as ex- 
15 perimentally observed: 

Inhibitor 3 > Inhibitor 1 > Inhibitor 2 

where '>' means 'binds better than'. The FEP calculation also 
gives the proper relation between inhibitors 1 and 2. 

The approximate calculations consistently lack 2 to 3 
20 kcal/mol. This error could of course be minimized by determi- 
nation of the optimum value of c as described above. 
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CLAIMS 

1. A method for predicting the binding free energy between a 
host or receptor molecule and a chemical substance bound 
thereto, comprising 

5 assessing the average energy difference, <AVf* s >, defined as 
<Vff s >B - < v i-s > B' between the contribution from polar inter- 
actions to the potential energy between the chemical subs tan - 
ce (denoted i) and its surroundings (denoted s) in two sta- 
tes, one state (A) being where the chemical substance is 
10 surrounded by solvent, the other state (B) being where the 
chemical substance, bound to the host or receptor molecule, 
is surrounded by solvent, 

assessing the average energy difference, <Avjfj>, defined as 
<v i?s > B " <v i?s > A' between the contribution from non- polar 

15 interactions to the potential energy between the chemical 

substance (denoted i) and its surroundings (denoted s) in two 
states, one state (A) being where the chemical substance is 
surrounded by solvent, the other state (B) being where the 
chemical substance, bound to the host or receptor molecule, 

20 is surrounded by solvent, 

and calculating the absolute binding free energy as an 
adjusted combination of the two above-mentioned average 
energy differences, 

2. A method according to claim 1, wherein the assessment of 
25 at least one of the average energy differences is performed 

by establishing a 3 -dimensional representation comprising a 
3 -dimensional structure of the host or receptor molecule and 
a 3 -dimensional structure of the chemical substance docked 
therein and applying molecular dynamics calculations to the 
30 3 -dimensional presentation. 

3 . A method according to claim 1 , wherein the assessment of 
both of the average energy differences is performed by estab- 
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lishing a 3 -dimensional representation comprising a 3 -dimen- 
sional structure of the host or receptor molecule and a 3- 
dimensional structure of the chemical substance docked there- 
in and applying molecular dynamics calculations to the 3- 
5 dimensional representation. 

4. A method according to any of the preceding claims , wherein 
the free energy of binding is predicted as 



where a and 0 are coefficients representing the result of a 
10 calibration established by comparing the results of predic- 
tions with actual measured values, and c is a constant 
reflecting extrapolation to zero size of the chemical sub- 
stance or reflecting correction for possible systematic 
errors . 

15 5. A method according to claim 4, wherein the free energy of 
binding is predicted as 



6. A method according to claim 4 or 5, wherein the absolute 
value of a is at the most 1.0, preferably at the most 0.3. 

20 7. A method according to claim 6, wherein the absolute value 
of a is at the most 0.2, 

8. A method according to claim 7, wherein the absolute value 
of of is about 0.16. 

9. A method according to claim 8, wherein the value of a is 
25 about 0.16. 



AG bind - fi<AVfi 8 > + <X<AVY*> + c 



(2b) 



(lb) 



10. A method according to any of claims 4-9, wherein the 
absolute value of & is at the most 1.0, preferably about 
0.2 - 0.8. 
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11. A method according to claim 10, wherein the absolute 
value of 0 is about 0.3 - 0.7, preferably about 0.4 - 0.6. 

12. A method according to claim 11, wherein the absolute 
value of & is about 0.5. 

5 13 . A method according to claim 12 , wherein the value of 0 is 
about 0*5. 

14. A method according to claim 13, wherein the free energy 
of binding is predicted as 

AG blnd « *<Av£ fl > + ot<Av?%> + c (2) 

10 15. A method according to claim 14, wherein the free energy 
of binding is predicted as 

- tf<AVf* s > + a<AV^> (1) 

16. A method according to any of claims 4 or 6-14 wherein c 
is in the range -10 to 10 kcal/mol. 

15 17. A method according to claim 16, wherein c is in the range 
of -3 to 3 kcal/mol. 

18. A method according to claim 17, wherein c is in the range 
of -2 to 2 kcal/mol. 

19. A method according to claim 18, wherein c is in the range 
20 of -1 to 1 kcal/mol. 

20. A method according to any of the preceding claims, where- 
in the solvent is selected from organic aromatic and non- 
aramatic organic solvents, inorganic solvents and mixtures 
thereof . 
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21. A method according to claim 20, wherein the solvent is 
©elected from methanol , ethanol, acetone, acetonitrile, 
chloroform, hexane, water and mixtures thereof. 

22. A method according to any of claims 1-19, wherein the 
5 solvent is an aqueous solvent. 

23. A method according to claim 22, wherein the solvent is 
water . 

24. A method for identifying a chemical substance capable of 
interacting with a host or receptor molecule, e.g. binding to 

10 the host or receptor molecule, with a predicted binding 
energy equal to or better than a predetermined threshold 
value, comprising 

1) choosing a chemical substance. A, which could potentially 
interact with the host or receptor molecule, and providing a 

15 3 -dimensional structural representation thereof, 

2) predicting the binding free energy between the chemical 
substance A and the host or receptor molecule by the method 
according to any of claims 1-23, 

20 3) if the predicted binding free energy between the chemical 
substance A and the host or receptor molecule determined is 
equal to or better than the predetermined threshold value, 
then identifying the chemical substance A as the chemical 
substance X, 

25 4) if the predicted binding free energy between the chemical 
substance A and the host or receptor molecule determined is 
not equal to or better than a predetermined threshold value, 
modifying the 3 -dimensional structural representation and 
predicting the binding free energy between the thus modified 

30 chemical substance, B, and the host or receptor molecule by 
the method according to any of claims 1-23, and 
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5) if nec ssary repeating step 4 until th predict d binding 
free energy determined between the resulting chemical sub- 
stance, X, and the host or receptor molecule is equal to or 
better than the predetermined threshold value. 

5 25. A method according to claim 24, further comprising pro- 
viding a sample of a chemical substance or chemical substan- 
ces selected from the chemical substances A, B and X and 
providing a sample of the host or receptor molecule, mea- 
suring the binding free energy between the chemical substance 

10 or substance and the host or receptor molecule, and if the 
measured binding free energy between the chemical substance 
or substances and the host or receptor molecule is not equal 
or substantially equal to the predicted value, then 
performing a calibration of the method according to any of 

15 claims 1-23 so as increase the predictive value of the 
method. 

26. A method for providing a chemical substance capable of 
interacting with a host or receptor molecule, e.g. binding to 
the host or receptor molecule, with a binding energy equal to 

20 or better than a predetermined threshold value, comprising 
performing the method according to claim 23 or 24 to identify 
a chemical substance X having a predicted binding free energy 
equal to or better than the predetermined threshold value, 
providing a sample of the chemical substance X and a sample 

25 of the host or receptor molecule and measuring the binding 
free energy between the chemical substance X and the host or 
receptor molecule, and establishing that the measured binding 
free energy between the chemical substance X and the host or 
receptor molecule is equal to or better than the predetermi- 

30 ned value. 
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